fix: prevent duplicate daily recap notifications via atomic SETNX lock (#4594) by beastoin · Pull Request #4646 · BasedHardware/omi

beastoin · 2026-02-07T02:25:58Z

Summary

Replace non-atomic check-then-set (has_daily_summary_been_sent / set_daily_summary_sent) with atomic try_acquire_daily_summary_lock using Redis SETNX — lock is acquired BEFORE the expensive LLM call, not after
Move in-function imports (generate_comprehensive_daily_summary, daily_summaries_db) to module top-level per codebase rules
Remove unused set_daily_summary_sent / has_daily_summary_been_sent functions from redis_db.py

Root cause

Cloud Scheduler triggers the notification cron every 1 minute. The ~3 minute LLM generation gap between has_daily_summary_been_sent check (line 119) and set_daily_summary_sent set (line 162) allowed multiple job instances to pass the check before any completed. Per @mon's BQ analysis, ~51% of users received duplicate daily summaries, roughly doubling the daily_summary LLM cost.

Fix

Same pattern as try_acquire_listen_lock() (redis_db.py:744) — r.set(key, '1', ex=ttl, nx=True) returns True only for the first caller. All subsequent callers get None and return early.

Test plan

9 new unit tests pass (SETNX behavior, concurrent access, lock integration)
Full backend/test.sh passes (all existing tests unaffected)
Monitor BQ daily_summary call distribution after deploy — expect ~100% single-call per user

🤖 Generated with Claude Code

) Replace non-atomic set_daily_summary_sent/has_daily_summary_been_sent with try_acquire_daily_summary_lock using Redis SETNX to prevent race condition where multiple cron instances pass the check before any completes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace has_daily_summary_been_sent check with try_acquire_daily_summary_lock, move in-function imports to module top-level, and remove post-LLM set_daily_summary_sent call (lock now handles dedup). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

9 tests covering SETNX lock behavior, concurrent access simulation, and _send_summary_notification lock integration (skip/proceed/no-convos). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request effectively addresses a critical race condition that caused duplicate daily summary notifications. The fix correctly replaces a non-atomic check-then-set logic with an atomic SETNX operation in Redis, ensuring that only one job instance can proceed with the expensive LLM call. The changes are clean, and the removal of unused functions and moving of imports improve code quality. The addition of a new test suite for this change is commendable and covers various scenarios, including concurrency. However, there is a critical flaw in the new test file backend/tests/unit/test_daily_summary_race_condition.py: it tests a copy of the try_acquire_daily_summary_lock function instead of the actual production code. This undermines the validity of the unit tests for the lock function itself. I've added a critical review comment with a suggestion on how to fix this to ensure the tests are robust and correctly validate the implementation.

gemini-code-assist · 2026-02-07T02:27:36Z

backend/tests/unit/test_daily_summary_race_condition.py

+def try_acquire_daily_summary_lock(uid: str, date: str, ttl: int = 60 * 60 * 2) -> bool:
+    result = mock_r.set(f'users:{uid}:daily_summary_lock:{date}', '1', ex=ttl, nx=True)
+    return result is not None
+
+
+redis_db_mod.try_acquire_daily_summary_lock = try_acquire_daily_summary_lock


This test is testing a local, copy-pasted version of try_acquire_daily_summary_lock, not the actual implementation from database/redis_db.py. This means that if the logic in the original file changes, this test might still pass, giving a false sense of security. The test should import and test the production code directly.

To fix this, you should import the actual function and mock its dependencies. Given the extensive module stubbing in this file, a robust way to achieve this is to patch the redis.Redis client at the top of the file before any application modules are imported. This will prevent the real Redis client from being instantiated.

Example of how you could refactor the test setup:

# At the top of your test file from unittest.mock import MagicMock, patch mock_r = MagicMock() patch('redis.Redis', return_value=mock_r).start() # Now you can safely import from your application modules # You might need to adjust the existing module stubbing from database.redis_db import try_acquire_daily_summary_lock # ... class TestTryAcquireDailySummaryLock: """Tests for the atomic SETNX lock function.""" def test_lock_acquired_returns_true(self): mock_r.reset_mock() mock_r.set.return_value = True assert try_acquire_daily_summary_lock('uid1', '2026-02-07') is True mock_r.set.assert_called_with('users:uid1:daily_summary_lock:2026-02-07', '1', ex=7200, nx=True) # ... other tests for try_acquire_daily_summary_lock

This change is critical to ensure the tests are correctly validating the production code.

- test_redis_error_propagates_no_silent_swallow: ConnectionError not silently swallowed - test_utc_fallback_still_acquires_lock: user_data without timezone still hits lock Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

beastoin · 2026-02-07T02:32:51Z

Test results:

bash backend/test.sh — pass (227 tests, 0 failures, 1 Pydantic deprecation warning)

11 new tests added (test_daily_summary_race_condition.py):

#	Test	Category
1	`test_lock_acquired_returns_true`	SETNX success path
2	`test_lock_already_held_returns_false`	SETNX failure path
3	`test_custom_ttl`	TTL parameter forwarding
4	`test_different_users_get_separate_locks`	Key isolation
5	`test_different_dates_get_separate_locks`	Key isolation
6	`test_concurrent_lock_attempts_only_one_wins`	5-thread race simulation
7	`test_redis_error_propagates_no_silent_swallow`	Transient error (CP8)
8	`test_skips_when_lock_not_acquired`	Integration: lock denied
9	`test_proceeds_when_lock_acquired`	Integration: lock granted
10	`test_no_conversations_skips_llm`	Integration: empty convos
11	`test_utc_fallback_still_acquires_lock`	Boundary: no timezone (CP8)

Please review when ready.

by AI for @beastoin

beastoin · 2026-02-07T04:33:47Z

Integration Test Confirmation

Ran the notification job against dev environment to validate the SETNX race condition fix end-to-end.

Setup

Backend started with dev env config
Production entry point: python job.py (matches Dockerfile.notifications_job CMD)

Run 1 — Lock Acquisition

9 users matched local hour 22 (UTC trigger)
All 9 acquired daily_summary_lock via SETNX
Summaries generated and notifications sent normally

Run 2 — Duplicate Prevention (simulating overlapping cron)

Same 9 users matched again
All 9 were blocked by existing locks — 0 duplicate processing
No LLM calls, no duplicate notifications sent

Redis Verification

SCAN 0 MATCH *daily_summary_lock* COUNT 1000
→ 9 keys found (one per user)
TTL on each key: ~7200s (2h, as configured)

Conclusion

The atomic SETNX lock prevents duplicate daily summaries when overlapping cron instances hit the same user. Before this fix, ~51% of users received duplicates (per @mon's BQ analysis in #4594). The race window between check-and-set (~3 min for LLM work) is now eliminated — the lock is acquired before any expensive work begins.

Integration test by AI for @beastoin

beastoin

lgtm

beastoin · 2026-02-07T07:21:05Z

@mon PR merged — can you trigger the notification job deployment?

gh workflow run gcp_notifications_job.yml -f environment=prod -f branch=main

by AI for @beastoin

beastoin · 2026-02-07T07:36:49Z

Deployment status: Live in prod.

Deployed via gcp_notifications_job.yml (environment=prod, branch=main) — confirmed by @mon.

@mon can you check BQ ~24h after deployment to verify duplicate daily summaries are no longer occurring? The previous analysis showed ~51% duplication rate, so a clean day would confirm the fix.

by AI for @beastoin

beastoin and others added 4 commits February 7, 2026 02:25

test: add unit tests for daily summary race condition fix (#4594)

39a1e16

9 tests covering SETNX lock behavior, concurrent access simulation, and _send_summary_notification lock integration (skip/proceed/no-convos). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: add daily summary race condition test to test.sh (#4594)

3f3053b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist bot reviewed Feb 7, 2026

View reviewed changes

beastoin commented Feb 7, 2026

View reviewed changes

beastoin merged commit 3989e8d into main Feb 7, 2026
1 check passed

beastoin deleted the fix/daily-summary-race-condition-4594 branch February 7, 2026 07:19

beastoin mentioned this pull request Feb 7, 2026

Duplicate daily recap notifications due to race condition in cron job #4594

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent duplicate daily recap notifications via atomic SETNX lock (#4594)#4646

fix: prevent duplicate daily recap notifications via atomic SETNX lock (#4594)#4646
beastoin merged 5 commits intomainfrom
fix/daily-summary-race-condition-4594

beastoin commented Feb 7, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Feb 7, 2026

Uh oh!

beastoin commented Feb 7, 2026

Uh oh!

beastoin commented Feb 7, 2026 •

edited

Loading

Uh oh!

beastoin left a comment

Uh oh!

Uh oh!

beastoin commented Feb 7, 2026

Uh oh!

beastoin commented Feb 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

beastoin commented Feb 7, 2026

Summary

Root cause

Fix

Test plan

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

beastoin commented Feb 7, 2026

Uh oh!

beastoin commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Integration Test Confirmation

Setup

Run 1 — Lock Acquisition

Run 2 — Duplicate Prevention (simulating overlapping cron)

Redis Verification

Conclusion

Uh oh!

beastoin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

beastoin commented Feb 7, 2026

Uh oh!

beastoin commented Feb 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

beastoin commented Feb 7, 2026 •

edited

Loading